POLS 2972Q

Quantitative Analysis in Political Science

Lecture 2 | Introduction to R and RStudio

If You Are Worried Because…

  • You have never done any statistics or coding…
    • Don’t worry, I assume you haven’t
  • Math is your “mortal enemy” or “you are not good at it”…
    • We will start from zero and progress slowly
    • There is plenty of help available
    • As long as you put in the time to do the work every week, you will do great!
  • You don’t think you will be able to “memorize” all the equations …
    • You really shouldn’t have to memorize anything in this class

Plan for Today

  • Become familiar with R and RStudio
  • Become familiar with R
    • Do calculations: +, -, *, /
    • Create objects: <-, ""
    • Use functions: (), sqrt(), #

R and RStudio

  • Recall that before class today you should have installed two programs on your computer:
    • R and RStudio
  • R is the statistical program that will perform calculations and create graphics for us (it’s the engine)

  • RStudio is the user-friendly interface that we will use to communicate with R

  • We will never open R directly; we will always start by opening RStudio
    • RStudio will open R by itself

RStudio

  • Go ahead and open RStudio
  • Then, open a new R script:
    • drop down menu: File > New File > R Script
  • What is an R script?
    • The type of file we use to store the code we write to analyze data

RStudio Layout

  • R Script (upper left window):
    • Where we write and execute code
  • R Console (lower left window):
    • Where R displays the executed code and its outputs (including errors)
  • Environment (upper right window):
    • Storage room of current R session
    • Lists objects that we have created
  • Help and Plots tabs (lower right window)

The R Programming Language

  • To use R, we need to learn its language
    • the R programming language
    • R is both the name of the program and the name of the language
  • Learning a programming language is like learning a foreign language
    • not easy
    • requires practice
    • requires patience

We will use R to:

  • Do calculations
  • Create objects
  • Use functions

Calculations

  • We can use R as a fancy calculator
    • R understands arithmetic operators such as +, -, *, /
  • Let’s ask R to calculate 20 plus 5
  • First we type on the R script (upper left window): 20 + 5
  • Then, to execute this code: we highlight it and either
    1. Manually click the Run icon
    2. Use the shortcut command + enter on Mac or ctrl + enter on Windows
  • Go ahead and do it
  • In the console, you should see the following:
20 + 5
[1] 25


  • First, the executed code: 20 + 5
  • Then, the output of the executed code: [1] 25
  • What does the [1] mean?
    • It indicates that the output immediately to its right is the first (and only, in this case) output
  • The title of the R script is now highlighted in a different color to indicate that you have unsaved changes
    • To save the R script, either use shortcuts (command + S or ctrl + S) or click on File > Save (or Save As…)
    • Name it “lecture2” so that you know what it refers to
  • In our textbooks, as well as in our lectures, you will probably see the output that you should see in the console right after the code that produces it, like so:
    • 20 + 5
    • ## [1] 25
  • 20 + 5 is the code to be typed and executed on the R script (the code that R will execute)
  • The symbol ## indicates the beginning of the output
  • [1] 25 is the output (what you should see in the console after the executed code)
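
The other arithmetic operators work the same way. A minimal sketch, reusing the numbers 20 and 5 from the example above: the text after # on each line is a comment that R ignores, and each line starting with ## shows the output you should see in the console.

```r
20 + 5   # addition
## [1] 25

20 - 5   # subtraction
## [1] 15

20 * 5   # multiplication
## [1] 100

20 / 5   # division
## [1] 4
```

Try running each line on its own with the Run icon or the keyboard shortcut.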

Create Objects

  • R stores information in the form of objects
  • In order to analyze data, we will need to create objects
  • An object is like a box that can contain anything
    • To create an object, we need to:
      • Give it a name
      • Specify its contents
      • Use the assignment operator <-

In R, we use the assignment operator <- to create an object:

  • To its left, we specify the name of the object
    • A name cannot begin with a number or contain spaces or special symbols (e.g., $ or %) that are reserved for other purposes
    • A name can contain _ underscores, which are good substitutes for spaces
  • To its right, we specify the content of the object


object_name <- object_content


  • For example, type and run:
twentyfive <- 25


  • After executing this code, the object twentyfive will show up in the Environment (the upper right window of RStudio)
  • To find out the contents of an object, you can run the name of the object in R:
twentyfive <- 25

twentyfive
[1] 25
  • Objects can contain text as well as numbers
  • Execute: class <- "pols2972Q"
  • Now in the environment there should be two objects
    • What are they?
  • Note that in this last piece of code we used quotes ("") around the contents, but we did not use quotes in the previous piece of code
twentyfive <- 25 vs class <- "pols2972Q"


  • Why?

When do we need to use quotes ("") when writing code in R?

  • The names of objects, names of functions, and names of arguments, as well as special values such as TRUE, FALSE, NA, and NULL, should NOT be in quotes
  • All other text should be in quotes

  • Numbers should never be in quotes unless you want R to treat them as text

  • What would happen if you executed: class <- pols2972Q

    • You will receive an error: Error: object 'pols2972Q' not found

      • Without the quotes, R thinks that pols2972Q is the name of an object, and it correctly reports that there is no object called pols2972Q in the environment
  • Running into errors is part of the coding process
    • Do not be discouraged
    • If you have problems figuring out what a particular error means, Google it; there are lots of Q&A sites
      • Copy and paste the error directly into Google
  • R will overwrite objects if you assign new content to an existing object name
class <- "pols2972Q"

class
[1] "pols2972Q"


class <- "data analysis"

class
[1] "data analysis"
  • Notice that class now contains the text “data analysis” instead of “pols2972Q”
  • R is case sensitive
    • class is different from Class
    • To avoid confusion, try to use lower-case letters when naming objects
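
The rules about quotes, overwriting, and case sensitivity can be checked directly. A minimal sketch, with some assumptions: the object name course is made up for this illustration, and is.numeric(), is.character(), and exists() are built-in R functions that are not covered in this lecture. They simply answer TRUE or FALSE.

```r
# Numbers go without quotes; text goes inside quotes
twentyfive <- 25
course <- "pols2972Q"   # course is a made-up object name

is.numeric(twentyfive)  # is the content a number?
## [1] TRUE
is.character(course)    # is the content text?
## [1] TRUE
is.character("25")      # a number in quotes is treated as text
## [1] TRUE

# Assigning new content to an existing name overwrites the old content
course <- "data analysis"
course
## [1] "data analysis"

# R is case sensitive: course exists, but Course does not
exists("course")
## [1] TRUE
exists("Course")
## [1] FALSE
```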

Use Functions

  • Think of a function as an action that you request R to perform on a particular object or piece of data, such as calculating the square root of 25


  • A function:
    • Takes input(s)
      • Example: the number 25
    • Performs an action with the input(s)
      • Example: computes \(\sqrt{25}\)
    • Produces an output
      • Example: the number 5
  • We will learn how to use these functions and others throughout this semester:
    • sqrt(), setwd(), read.csv(), View(), head(), dim(), mean(), ifelse(), table(), prop.table(), ggplot(), median(), lm(), print(), and summary()
  • In time, we will learn:
    • Their names
    • The actions they perform
    • The inputs they require
    • The outputs they produce
  • The name of a function (without quotes) is always followed by parentheses: function_name()
  • Inside the parentheses, we specify the inputs, which we refer to as arguments: function_name(arguments)
  • Most functions require at least one argument and can also take optional arguments
    • Some arguments are required; others are optional
  • When multiple arguments are specified inside the parentheses, they are separated by commas: function_name(argument1, argument2)
  • To specify the arguments, we enter them in a particular order or include the name of the argument (without quotes) in our specification:
    • function_name(argument1, argument2)
    • function_name(argument1_name = argument1, argument2_name = argument2)
  • We always specify required arguments first
  • If there is more than one required argument, we enter them in the order expected by R
  • We specify any optional arguments we want next and include their names:
    • function_name(required_arguments, optional_argument_name = optional_argument)
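
Both ways of specifying arguments can be tried with a real function. A minimal sketch, assuming round(), a built-in R function that is not covered in this lecture: its required argument x is the number to round, and its optional argument digits (0 by default) is the number of decimal places to keep.

```r
# By order: the first value is x, the second is digits
round(3.14159, 2)
## [1] 3.14

# By name: each argument is labeled, without quotes
round(x = 3.14159, digits = 2)
## [1] 3.14

# Required argument only: digits keeps its default of 0
round(3.14159)
## [1] 3

# Required argument first, then the named optional argument
round(3.14159, digits = 2)
## [1] 3.14
```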

Function Example

  • Suppose R were capable of baking and that it had a function named bake() that, by default, bakes the specified ingredient for 60 minutes at 400F
  • Required argument = the ingredient
    • Our ingredient is cake mix
  • Optional arguments: named degrees and minutes to change the default temperature and duration of the bake, respectively
    • degrees = 350 which changes the temperature to 350F
    • minutes = 30 which changes the duration of bake to 30 minutes
  • The following code would ask R to bake a cake mix for 30 minutes at 350F, so that we can have cake as the output:
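
bake() is not a real R function, so this example cannot be run as is. A minimal sketch, assuming a made-up definition of bake() (the function body and the message it returns are invented for illustration, and defining your own functions goes beyond this lecture), makes the call with the arguments described above:

```r
# Hypothetical bake(): not part of R, defined here only for illustration.
# ingredient is required; degrees and minutes are optional, with defaults.
bake <- function(ingredient, degrees = 400, minutes = 60) {
  paste("Baked", ingredient, "for", minutes, "minutes at", degrees, "F")
}

# Required argument first, then the named optional arguments
bake("cake mix", degrees = 350, minutes = 30)
## [1] "Baked cake mix for 30 minutes at 350 F"

# With only the required argument, the defaults apply
bake("cake mix")
## [1] "Baked cake mix for 60 minutes at 400 F"
```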

Function Example Part Deux

  • sqrt() computes the square root of the argument specified inside the parentheses. To compute \(\sqrt{25}\), run:
sqrt(25)
[1] 5
  • sqrt is the name of the function, which, like all function names, is followed by parentheses ()

  • 25 is the required argument

  • 5 is the output

  • Alternatively, since the object twentyfive contains the number 25, we can run:
sqrt(twentyfive)
[1] 5


  • R will give you an error message if you run this line of code before creating the object twentyfive

  • Code is sequential! One must run code in order

    • Whenever returning to work on an R script, run all the code from the beginning
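
The order of the lines in an R script matters. A minimal sketch of a script that works when run from top to bottom:

```r
# Step 1: create the object
twentyfive <- 25

# Step 2: use the object (this line works only after step 1 has been run)
sqrt(twentyfive)
## [1] 5
```

Running only step 2 in a fresh R session would produce an "object not found" error, because twentyfive would not yet be in the environment.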

Good Coding Practices

  • It is good practice to comment code
    • Include short notes to yourself or your collaborators explaining what the code does
  • To comment code, we use #
    • R ignores everything that follows this character until the end of the line
#calculate the square root
sqrt(25)
[1] 5
sqrt(25) #calculates square root of 25
[1] 5
  • Before closing your computer, remember to save the R script; otherwise you risk losing unsaved changes
    • Either use shortcuts (command + S or ctrl + S)
    • Or click on File > Save
  • If you quit RStudio, R will ask whether you want to save the workspace image, which contains all the objects you have created during the R session
    • I recommend that you do not save it
    • You can always re-create the objects by re-running the code in your R script

For Next Class

  • Review what you learned today

  • Do the new set of readings:
    • DASS Chapter 1.7-1.10
    • Follow along with the exercises on your own computer
  • Bring your laptops to class